
Dynamo AI Resource Provisioning Guidelines

This reference outlines resource provisioning recommendations for the Dynamo AI platform based on expected feature utilization and workloads, helping ensure optimal performance across use cases and scenarios.

Scaling Considerations

Dynamo AI platform resource recommendations are based on the following metrics:

  • Throughput: Number of requests per second
  • Guardrails: Number of guardrails applied per moderation request (DynamoGuard)

Throughput Scenarios

Below, we provide resource requirements for different throughput scenarios, ranging from < 1 QPS to 100 QPS. For context, we typically observe production workloads of 0.1 - 10 QPS in our customers' AI use cases; however, Dynamo AI can support peak workloads exceeding 250 QPS.

Example: For an AI use case with 100k global users, a throughput of 10 QPS equates to approximately 8-12 queries per user per day.
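As a back-of-the-envelope check of the example above (a sketch, not official sizing tooling), the per-user figure follows directly from seconds per day:

```python
# Convert sustained throughput (QPS) into average queries per user per day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def queries_per_user_per_day(qps: float, users: int) -> float:
    """Total daily queries spread evenly across the user base."""
    return qps * SECONDS_PER_DAY / users

# 10 QPS across 100k users ≈ 8.6 queries per user per day,
# which falls in the 8-12 range quoted above.
print(round(queries_per_user_per_day(10, 100_000), 1))  # 8.6
```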

| Scenario | Expected Throughput (QPS) | Use Cases |
| --- | --- | --- |
| Development | 1 QPS | Testing environments |
| Small | 5 QPS | Lightweight production scenarios |
| Medium | 10 QPS | Moderate-scale AI applications |
| Large | 50 QPS | High-demand production applications |
| Extra Large | 100 QPS | Enterprise-scale, high-performance systems |

Resource Guidelines

General Platform

Base platform resources are used for API and UI servers. The table below outlines recommended configurations:

| Scenario | Recommended Resources | Example Cloud-Specific Details |
| --- | --- | --- |
| Development | x32 vCPUs, 64GB memory | AWS: x8 c7i.xlarge<br>Azure: x8 Standard_F4s_v2<br>GCP: x8 c2d-standard-4 |
| Small | x32 vCPUs, 64GB memory | AWS: x8 c7i.xlarge<br>Azure: x8 Standard_F4s_v2<br>GCP: x8 c2d-standard-4 |
| Medium | x64 vCPUs, 128GB memory | AWS: x16 c7i.xlarge<br>Azure: x16 Standard_F4s_v2<br>GCP: x16 c2d-standard-4 |
| Large | x128 vCPUs, 256GB memory | AWS: x32 c7i.xlarge<br>Azure: x32 Standard_F4s_v2<br>GCP: x32 c2d-standard-4 |
| Extra Large | x128 vCPUs, 256GB memory | AWS: x32 c7i.xlarge<br>Azure: x32 Standard_F4s_v2<br>GCP: x32 c2d-standard-4 |

Note: This table is a general reference. You may need fewer resources than listed here, because general platform components can run on the GPU nodes and share their vCPUs and RAM. Provisioning the full amounts above, however, guarantees performance.
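As a rough sketch (not official sizing tooling), the instance counts in the table above follow from the 4 vCPU / 8GB shape shared by the example instance types (c7i.xlarge, Standard_F4s_v2, c2d-standard-4); the helper below is a hypothetical illustration of that mapping:

```python
import math

def instance_count(vcpus_needed: int, mem_gb_needed: int,
                   vcpus_per_instance: int = 4,
                   mem_gb_per_instance: int = 8) -> int:
    """Smallest instance count that satisfies both vCPU and memory targets."""
    return max(math.ceil(vcpus_needed / vcpus_per_instance),
               math.ceil(mem_gb_needed / mem_gb_per_instance))

# Medium scenario from the table: 64 vCPUs, 128GB memory -> 16 instances.
print(instance_count(64, 128))  # 16
```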

DynamoGuard Content Guardrails

DynamoGuard resource requirements scale with the number of guardrails applied to a workload. While CPUs can handle a limited number of guardrails at higher latency, GPUs offer significantly lower latency (< 300ms). For lower latency when using CPUs, we recommend compute-optimized instances. Below, we provide the resource requirements for input content guardrails. For details on output content guardrails, please reach out to our team.

Tip: For non-development workloads, we recommend GPUs due to reduced latency and higher scalability.

Note: You should calculate the number of GPUs you need based on your scenario and the number of policies you will deploy in the cluster.

| Scenario | CPU Option | GPU Option | Example Cloud-Specific Instances |
| --- | --- | --- | --- |
| Development | x8 vCPUs, 8GB memory per guardrail | Same as Small scenario | AWS: c7i.xlarge<br>Azure: Standard_F4s_v2<br>GCP: c2d-standard-4 |
| Small | Not Recommended | 1 A10G GPU per 10 guardrails | AWS: g5.2xlarge<br>Azure: NV36ads_A10_v5<br>GCP: g2-standard-8 |
| Medium | Not Recommended | 1 A10G GPU per 6 guardrails | AWS: g5.2xlarge<br>Azure: NV36ads_A10_v5<br>GCP: g2-standard-8 |
| Large | Not Recommended | 1 A10G GPU per guardrail | AWS: g5.2xlarge<br>Azure: NV36ads_A10_v5<br>GCP: g2-standard-8 |
| Extra Large | Not Recommended | 2 A10G GPUs per guardrail | AWS: g5.2xlarge<br>Azure: NV36ads_A10_v5<br>GCP: g2-standard-8 |
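The GPU calculation the note above asks for can be sketched as follows (a hypothetical helper, assuming the per-scenario densities quoted in the table):

```python
import math

# Guardrails served per A10G GPU, from the table above.
GUARDRAILS_PER_GPU = {"Small": 10, "Medium": 6, "Large": 1}
# Extra Large inverts the ratio: 2 GPUs per guardrail.
GPUS_PER_GUARDRAIL = {"Extra Large": 2}

def gpus_needed(scenario: str, num_guardrails: int) -> int:
    """A10G GPU count for a given scenario and deployed guardrail count."""
    if scenario in GPUS_PER_GUARDRAIL:
        return num_guardrails * GPUS_PER_GUARDRAIL[scenario]
    return math.ceil(num_guardrails / GUARDRAILS_PER_GPU[scenario])

print(gpus_needed("Medium", 8))       # 2 (8 guardrails at 6 per GPU)
print(gpus_needed("Extra Large", 3))  # 6 (2 GPUs per guardrail)
```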

Data Generation

Data generation is a required step in custom content policy creation. Dynamo AI supports several external and in-cluster model configurations for data generation. If required, contact Dynamo support for additional model providers.

| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Azure Llama 3.1-8B: `azure_ai/Meta-Llama-3-1-8B-Instruct` | N/A |
| Option 2 | AWS Llama 3.1-8B: `bedrock/llama/us.meta.llama3-1-8b-instruct-v1:0` | N/A |
| Option 3 | GCP Llama 3.1-8B: `llama-3.1-8b-instruct-maas` | N/A |
| Option 4 (less performant) | In-cluster model: x1 A10G GPU + x8 vCPUs | AWS: x1 g5.2xlarge<br>Azure: x1 NV36ads_A10_v5<br>GCP: x1 g2-standard-8 |

Guardrail Fine-Tuning

Guardrail fine-tuning is a required step in custom content policy creation. DynamoGuard offers two options for fine-tuning guardrails: choose between SaaS fine-tuning and in-cluster fine-tuning based on your infrastructure.

| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Fine-tune in the Dynamo SaaS environment and import policies into your cluster | N/A |
| Option 2 | x1 A10G (or similar) GPU with x8 vCPUs, 32GB RAM, and 24GB GPU memory | AWS: x1 g5.2xlarge<br>Azure: x1 NV36ads_A10_v5<br>GCP: x1 g2-standard-8 |

DynamoGuard Hallucination Guardrails

For hallucination guardrails, Dynamo supports both external and in-cluster configurations. If required, contact Dynamo support for additional model providers.

| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Azure-Open-AI-GPT-4o or Open-AI-GPT-4o | N/A |
| Option 2 | In-cluster: requires x3 A10G GPUs at 1 QPS; additional GPUs scale linearly | AWS: x3 g5.2xlarge<br>Azure: x3 NV36ads_A10_v5<br>GCP: x3 g2-standard-8 |

DynamoEval

DynamoEval requires the following resource configurations. The API endpoints are used for data generation and judging.

| Requirement | Description | Cloud-Specific Details |
| --- | --- | --- |
| API Endpoints | `mistral-small-latest`, `open-mistral-nemo`, Open-AI-GPT-4o | N/A |
| CPU and memory | x1 vCPU, 4GB memory | AWS: x1 c7i.xlarge<br>Azure: x1 Standard_F4s_v2<br>GCP: x1 c2d-standard-4 |